Data Transformation using Unity Catalog

In this topic, we discuss how to create a data transformation pipeline that uses a Unity Catalog-enabled Databricks node for data transformation and Databricks Unity Catalog as the data lake.

Prerequisites

Ensure that you complete the following prerequisites before you create a data transformation job:

  • Access to a Databricks node that has Unity Catalog enabled, which is used as the data transformation node in the pipeline. The Databricks Runtime version of the cluster must be 14.3 LTS or later. The access mode must be dedicated or standard.

  • Access to a Databricks Unity Catalog node, which is used as the data lake in the pipeline.
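
Before you build the pipeline, you can confirm that the cluster can reach Unity Catalog. The following PySpark sketch is illustrative only; it assumes a Databricks notebook attached to the Unity Catalog-enabled cluster and uses standard Spark SQL commands:

    # Minimal connectivity check on a Unity Catalog-enabled cluster (illustrative).
    # Run in a Databricks notebook attached to the cluster you plan to use.

    # List the catalogs visible to the current user; an empty result usually
    # means Unity Catalog is not enabled or no grants have been made.
    display(spark.sql("SHOW CATALOGS"))

    # Show the catalog and schema the session currently resolves table names against.
    display(spark.sql("SELECT current_catalog(), current_schema()"))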

Creating a data transformation job

  1. On the home page of Data Pipeline Studio, add the following stages and connect them as shown below:

    • Data Transformation (Databricks - Unity Catalog enabled)

    • Data Lake (Databricks Unity Catalog)

    Figure: Unity Catalog data transformation pipeline

  2. Configure the data lake node.

    • Click the Use an existing Databricks Unity Catalog dropdown and select an instance. Click Add to data pipeline.

    • Click the Schema Name dropdown and select a schema.

    • (Optional) Click Data Browsing to browse the folders and view the required files.

    • Click Save.
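
    If you prefer to verify the data lake contents outside of Data Browsing, you can inspect the selected schema from a notebook. The sketch below is illustrative only; the catalog, schema, and table names (main, sales, orders) are hypothetical placeholders:

      # Illustrative check of the data lake schema (hypothetical names).
      catalog = "main"    # replace with the catalog backing your data lake node
      schema = "sales"    # replace with the schema selected under Schema Name

      # List tables registered in the selected schema.
      display(spark.sql(f"SHOW TABLES IN {catalog}.{schema}"))

      # Preview a table; 'orders' is a hypothetical table name.
      display(spark.table(f"{catalog}.{schema}.orders").limit(10))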

  3. Click the data transformation node and do the following:

    • Select one of the following options to use for the data transformation job:

      • Spark Cluster

      • SQL Warehouse

      For details, see Spark Cluster or SQL Warehouse.

    • Click Create Templatized Job.

    Complete the steps in the Create Templatized Job workflow to create the job.
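
    For context, the transformation logic that such a job runs is typically a read-transform-write against Unity Catalog tables. The following PySpark sketch is illustrative only; the catalog, schema, table, and column names are hypothetical placeholders, not values produced by the job workflow:

      # Illustrative transformation: read from a Unity Catalog table, aggregate,
      # and write the result back to the data lake schema (hypothetical names).
      from pyspark.sql import functions as F

      source_table = "main.sales.orders"          # hypothetical source table
      target_table = "main.sales.daily_revenue"   # hypothetical target table

      orders = spark.table(source_table)

      daily_revenue = (
          orders
          .withColumn("order_date", F.to_date("order_ts"))
          .groupBy("order_date")
          .agg(F.sum("amount").alias("revenue"))
      )

      # Overwrite keeps the example idempotent across reruns.
      daily_revenue.write.mode("overwrite").saveAsTable(target_table)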

Recommended Topics

What's next? Snowflake Custom Transformation Job